Abstract: In this paper, we investigate the topology convergence
problem for the gossip-based Gradient overlay network.
In an overlay network where each node has a local utility 
value, a Gradient overlay network is characterized by the 
properties that each node has a set of neighbors with the 
same utility value (a similar view) and a set of neighbors 
containing higher utility values (gradient neighbor set), such 
that paths of increasing utilities emerge in the network topology. 
The Gradient overlay network is built using gossiping and a 
preference function that samples from nodes using a uniform 
random peer sampling service. We analyze it using tools from 
matrix analysis, and we prove both the necessary and sufficient 
conditions for convergence to a complete gradient structure, as 
well as estimating the convergence time and providing bounds 
on worst-case convergence time. Finally, we show in simulations 
the potential of the Gradient overlay, by building a more 
efficient live-streaming peer-to-peer (P2P) system than one built 
using uniform random peer sampling. 

Keywords: Overlay networks; topology convergence; gossiping;
gradient topology

I. Introduction 

Recent years have witnessed growing interest in using ran- 
domized gossiping algorithms to build distributed systems, in 
particular in the areas of overlay networks, sensor networks 
and cloud computing storage services [1], [2]. Gossip-based, 
or pair-wise exchange, algorithms have primarily been used 
to implement aggregation algorithms, information dissemi- 
nation, peer sampling (the uniform random sampling of a 
node from the set of all nodes in a P2P system), and to 
construct overlay network topologies. Much of the existing 
analysis of gossip-based algorithms has focused on the 
convergence properties of aggregation algorithms and peer 
sampling services, for both fixed topologies [3] and regular 
graphs [4], [5]. 

However, research in gossiping has also focused on using 
the Preferential Connectivity Model [6] to construct overlay 
network topologies, where nodes connected initially in a 
random graph use a preferential connection function to break 
the symmetry of the random graph and build a topology that 
contains useful global information. Barabasi first described 
how a preferential attachment function in a growing network 
can build a scale-free network topology from a random graph 
[7]. In particular, he showed how the power-law distribution
of links in the World Wide Web can emerge when arriving
nodes preferentially attach to existing nodes with higher edge 
degree. Information about the structure of the Web's topology 
is currently used, among other things, to build more efficient 
search algorithms. Barabasi's preferential attachment functions
are based on global state (the in-degree of nodes). However,
in overlay networks, nodes have only a relatively small
partial view of the system, so preference functions are based 
only on local state and the state of the node's neighbors. 
Examples of existing overlay networks that construct their 
topologies using gossiping and preference functions include 
Spotify, which preferentially connects nodes with similar music
play-lists [8], Sepidar, which preferentially connects P2P
live-streaming nodes with similar upload bandwidth capacity [9],
and T-Man, a framework that provides a generic preference
function for building such overlays [10].

To the best of our knowledge, there has been no analysis 
of the convergence properties of such information-carrying 
gossip-generated topologies built using preference functions. 
These systems, however, do not require the growth of a net- 
work to construct a new topology, as systems are constantly 
updated using a peer sampling service. In this paper, we 
introduce an analysis of the convergence properties for the 
Gradient overlay network. The Gradient topology belongs 
to this class of gossip-generated overlay networks that are 
built from a random overlay by symmetry breaking using a 
preference function. Formally, a Gradient topology is defined 
as an overlay network where, for any two nodes $p$ and $q$ that
have local utility values $U(p)$ and $U(q)$, if $U(p) > U(q)$
then $\mathrm{dist}(p, r) < \mathrm{dist}(q, r)$, where $r$ is a (or the) node with
highest utility in the system and $\mathrm{dist}(x, y)$ is the shortest
path length between nodes $x$ and $y$ [11]. In the Gradient
overlay, nodes have two preference functions that build two 
sets of neighbors: a similar view and a gradient view. For 
the similar view, nodes prefer neighbors with closer utility 
values, while for the gradient view, nodes prefer nodes with 
higher, but closer, utility values. Together these preference 
functions build a topology where gradient paths of increasing 
utilities emerge in the system [12]; see Figure 1.
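The formal definition above can be checked mechanically on a small overlay. The following is a minimal sketch, assuming an undirected overlay stored as an adjacency dictionary and a single highest-utility node; the names `bfs_distances`, `is_gradient_topology` and the toy graphs are hypothetical and only illustrate the condition with the strict inequalities as stated in the text.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def is_gradient_topology(adjacency, utility):
    """Check that U(p) > U(q) implies dist(p, r) < dist(q, r)."""
    r = max(utility, key=utility.get)   # a node with highest utility
    # Assumption: links are undirected, so dist(r, x) equals dist(x, r).
    dist = bfs_distances(adjacency, r)
    if len(dist) < len(utility):
        return False                    # some node cannot reach r at all
    nodes = list(utility)
    return all(dist[p] < dist[q]
               for p in nodes for q in nodes
               if utility[p] > utility[q])

# Hypothetical toy overlay: a chain a - b - c with utilities 3, 2, 1.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
chain_utility = {"a": 3, "b": 2, "c": 1}
# Same utilities, but the lowest-utility node sits next to the top node.
shuffled = {"a": ["c"], "c": ["a", "b"], "b": ["c"]}

print(is_gradient_topology(chain, chain_utility))     # prints True
print(is_gradient_topology(shuffled, chain_utility))  # prints False
```

In the second graph the node with utility 2 is two hops from the top node while the node with utility 1 is one hop away, so the gradient condition fails.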

Our analysis of the Gradient overlay involves proving
that the preference functions cause the system topology to
converge to a gradient structure. We also establish bounds
on the worst-case convergence rate for a given initial graph.
Finally, we show in simulations how the Gradient structure
is used to build a more efficient live-streaming system than
one built using uniform random peer sampling.

II. Problem Setup 

Consider a network whose topology can be described
by a directed graph $\mathcal{G}(\mathcal{N}, \mathcal{E})$. Each node in the network
is represented by a vertex in the graph, and each link is
represented by a directed edge (see Figure 1(a)). We denote
the vertex set by $\mathcal{N} = \{1, \ldots, N\}$, where each node $i$ is
given a utility value $U(i) \in \mathcal{A}$ from a given utility value set
$\mathcal{A} = \{1, \ldots, n\}$.

Fig. 1. The network is described as a directed graph. The nodes are labeled with their respective utility value, and the edges from the similar neighbor
set are shown. Solid edges are used between nodes with equal utility value, and dashed edges between nodes with different utility value.

Let $A_u = \{i \mid U(i) = u\}$ be the set consisting of nodes
with utility $u$, $u = 1, \ldots, n$. Suppose $|A_u| = m_u$, where $|\cdot|$
denotes the number of elements of a finite set. The utility
distance function is denoted as $d(i,j) = |U(i) - U(j)|$.

The neighbor set $N_i(t)$ of node $i$ at time $t$ consists of
two parts, the similar view $N_i^s(t)$ and the random view
$N_i^r(t)$. Nodes in the similar view are supposed to be the
neighbors whose utility values are close to $U(i)$, while nodes
in the random view are a random sample of the nodes in the
network.

Assumption 1: For every node $i \in \mathcal{N}$, if $i \in A_u$ then $i$ has
exactly $m_u$ similar neighbors, excluding itself.

III. Topology Dynamics 

For any given initial graph, consider the following algorithm
for the topology dynamics:
Algorithm 1. Let $t = 1$.

Step 1. At time $t$, node $i$ chooses a random neighbor $j$
from the node set $\mathcal{N}$ with equal probability, i.e.,

$$\mathbb{P}\{N_i^r(t) = \{j\}\} = p, \quad j \in \mathcal{N},$$

where $p$ satisfies $0 < Np \le 1$. Notice that the random
neighbor set is empty with probability $1 - Np$, in which
case we skip Step 2.

Step 2. If the random node $j$ is an improvement of the
similar neighbor set, then we replace the worst node
in $N_i^s$ with $j$. That is, if $U(j) \ge U(i)$ and $d(i,j) <
\max_{k \in N_i^s} d(i,k)$, then add $j$ to $N_i^s$ and remove $u =
\arg\max_{k \in N_i^s} d(i,k)$ from $N_i^s$.
Step 3. Let $t = t + 1$, then go to Step 1.
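Steps 1 and 2 of Algorithm 1 can be sketched for a single node as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `gossip_step` is hypothetical, samples of the node itself or of an existing neighbor are ignored, and the utility test is written as `>=` so that equal-utility nodes can enter the similar view, which is what the transition probability in equation (1) relies on.

```python
import random

def gossip_step(i, similar_view, utility, nodes, p, rng):
    """One iteration of Steps 1-2 for node i; returns True if the view changed."""
    # Step 1: every node is sampled with probability p, so with
    # probability 1 - N*p the random view is empty and Step 2 is skipped.
    if rng.random() >= len(nodes) * p:
        return False
    j = rng.choice(nodes)
    if j == i or j in similar_view:
        return False  # assumption: self and duplicate samples are ignored

    def d(k):
        return abs(utility[i] - utility[k])

    # Step 2: replace the worst similar neighbor if j is an improvement.
    worst = max(similar_view, key=d)
    if utility[j] >= utility[i] and d(j) < d(worst):
        similar_view.remove(worst)
        similar_view.add(j)
        return True
    return False

# Hypothetical toy network: three nodes of utility 1 and three of utility 2.
rng = random.Random(0)
nodes = list(range(6))
utility = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
view = {3, 4, 5}                       # worst case: no equal-utility neighbor
for _ in range(2000):
    gossip_step(0, view, utility, nodes, p=1 / 12, rng=rng)
different = sum(1 for k in view if utility[k] != utility[0])
```

After enough iterations the view of node 0 holds both other utility-1 nodes and exactly one utility-2 node, so `different` ends at 1.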

This paper considers the problem of whether the system 
topology will converge to a gradient structure with the 
proposed algorithm, and the convergence rate for a given 
initial graph. 



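For every node $i \in \mathcal{N}$, we define

$$X_i^{(t)} = \sum_{j \in N_i^s(t)} \operatorname{sgn}(d(i,j)),$$

where

$$\operatorname{sgn}(v) = \begin{cases} 0, & \text{if } v = 0, \\ 1, & \text{otherwise.} \end{cases}$$

Thus, $X_i^{(t)}$ counts the number of nodes in $i$'s similar neighbor
set with a different utility value than $U(i)$.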
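As a small illustration of this counting measure, the sketch below evaluates it for one node. The function names and the example utilities are hypothetical.

```python
def sgn(v):
    """Indicator used in the text: 0 if v == 0, otherwise 1."""
    return 0 if v == 0 else 1

def different_utility_count(i, similar_view, utility):
    """X_i: number of similar neighbors whose utility differs from U(i)."""
    return sum(sgn(abs(utility[i] - utility[j])) for j in similar_view)

utility = {1: 5, 2: 5, 3: 5, 4: 6, 5: 8}
print(different_utility_count(1, {2, 3, 4}, utility))  # prints 1
print(different_utility_count(1, {2, 4, 5}, utility))  # prints 2
```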

Let $\mathcal{G}(t)$ be the graphs generated by Algorithm 1. Then we
give the definition of gradient convergence as follows (see
also Figure 1).

Definition 3.1: $\mathcal{G}(t)$ is said to converge to a gradient
topology if $\lim_{t\to\infty} X_i^{(t)} = 1$ for $i \in \mathcal{N}$, and with $U(j) =
U(i) + 1$, where $j$ is the only node with different utility in
$\lim_{t\to\infty} N_i^s(t)$, for $i \in \mathcal{N}$, $U(i) < n$.

IV. Convergence Analysis 

In this section, we propose a gradient convergence analysis,
where we focus on the first condition $\lim_{t\to\infty} X_i^{(t)} = 1$.
The analysis can be extended to handle the second condition,
with similar results. Since each node updates its neighbor
set independently, the analysis of $X_i^{(t)}$ can be carried out
for each node separately. Therefore, we let $X_t$ represent $X_i^{(t)}$, $i \in \mathcal{N}$,
in the following discussion to simplify the notation.

Denote $m = \max_u\{m_u\}$. Then it is not hard to see
that $X_0 = m$ is the worst initial condition. In practice, the
sampling probability $p$ in Algorithm 1 can be time-varying,
i.e., $p = p_t$, $t = 1, 2, \ldots$. Furthermore, for all $t = 1, 2, \ldots$,
one has

$$\mathbb{P}\{X_{t+1} = k \mid X_t = k + 1\} = k p_t, \tag{1}$$

where $k p_t$ is the probability of sampling one of the $k$
remaining nodes with the same utility value.



A. Almost Sure Convergence 

We give a necessary and sufficient condition on
the probabilities $p_t$ for the convergence of Algorithm 1.

Theorem 4.1: The graph generated by Algorithm 1 converges
to a gradient topology ($X_t \to 1$) with probability 1 if
and only if

$$\lim_{T\to\infty} \prod_{t=0}^{T} (1 - p_t) = 0. \tag{2}$$

Before proving Theorem 4.1, let us take a closer look at
Algorithm 1, and notice especially that the stochastic process
(1) for $X_t$ has the Markov property, hence we can describe
it as a Markov chain.

[Figure: Markov chain for $X_t$, with self-loop probabilities $1 - (m-1)p_t$, $1 - (m-2)p_t$, ...]

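Let $\pi(t)$ denote the (row vector) probability distribution
for the states $X_t$, ordered as $X_t = m, m-1, \ldots, 1$ to match the
transition matrix below, i.e.,

$$\pi_i(t) = \mathbb{P}\{X_t = m + 1 - i\}.$$

The evolution of $\pi(t)$ can be written in matrix form as

$$\pi(t+1) = \pi(t) P_t, \tag{3}$$

where $P_t$ is the transition matrix at time $t$,

$$P_t = \begin{bmatrix}
1-(m-1)p_t & (m-1)p_t & & & \\
 & 1-(m-2)p_t & (m-2)p_t & & \\
 & & 1-(m-3)p_t & \ddots & \\
 & & & \ddots & \\
 & & & 1-p_t & p_t \\
 & & & & 1
\end{bmatrix}. \tag{4}$$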

Since $P_t$ is a triangular matrix, the eigenvalues are given by
the diagonal elements, i.e., the eigenvalues of $P_t$ are $\lambda_i(t) =
1 - (m - i)p_t$, $i = 1, \ldots, m$. Notice that $\lambda_m(t) = 1$, and
all other eigenvalues are strictly less than one. Furthermore,
all eigenvalues are distinct, hence the eigenvectors form a
basis for $\mathbb{R}^m$. In the following lemma, we characterize the
eigenvectors.
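The structure of the transition matrix can be explored numerically. The sketch below is illustrative only: it assumes a constant sampling probability, orders the states so that row $i$ has $m - i$ equal-utility nodes still to be found (the last row is the absorbing state $X_t = 1$), and uses exact rational arithmetic. The names `transition_matrix` and `step` are hypothetical.

```python
from fractions import Fraction

def transition_matrix(m, p):
    """Upper-bidiagonal P_t; row i (1-based) has m - i nodes still to find."""
    P = [[Fraction(0)] * m for _ in range(m)]
    for i in range(1, m + 1):
        P[i - 1][i - 1] = 1 - (m - i) * p
        if i < m:
            P[i - 1][i] = (m - i) * p
    return P

def step(pi, P):
    """Row vector times matrix: pi(t+1) = pi(t) P_t."""
    m = len(P)
    return [sum(pi[k] * P[k][j] for k in range(m)) for j in range(m)]

m, p = 4, Fraction(1, 10)
P = transition_matrix(m, p)
eigenvalues = [P[i][i] for i in range(m)]      # diagonal of a triangular matrix
pi = [Fraction(1)] + [Fraction(0)] * (m - 1)   # worst case: start in the first state
for _ in range(200):
    pi = step(pi, P)
print([str(ev) for ev in eigenvalues])  # prints ['7/10', '4/5', '9/10', '1']
```

With these example values the probability mass in the absorbing state, `pi[-1]`, is numerically indistinguishable from 1 after 200 steps.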

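Lemma 4.1: The eigenvector $\xi^i(t)$ corresponding to
eigenvalue $\lambda_i(t)$ is independent of $p_t \neq 0$, $i = 1, \ldots, m$.

Proof: The (left-)eigenvectors of $P_t$ satisfy $\lambda_i(t)\xi^i(t) =
\xi^i(t)P_t$. Let $\xi^i_j(t)$ denote the $j$:th component of $\xi^i(t)$, then

$$\begin{cases}
(1-(m-i)p_t)\,\xi^i_1(t) = (1-(m-1)p_t)\,\xi^i_1(t), \\
(1-(m-i)p_t)\,\xi^i_j(t) = (1-(m-j)p_t)\,\xi^i_j(t) + (m-j+1)p_t\,\xi^i_{j-1}(t), & j = 2, \ldots, m
\end{cases}$$

$$\Longleftrightarrow \quad
\begin{cases}
(i-1)\,\xi^i_1(t) = 0, \\
\xi^i_j(t) = 0, & \text{if } j < i, \\
\xi^i_j(t) = -\dfrac{m-j+1}{j-i}\,\xi^i_{j-1}(t), & j = i+1, \ldots, m,
\end{cases} \tag{5}$$

while $\xi^i_i(t)$ can be chosen as an arbitrary non-zero value. ■
Lemma 4.1 implies especially that all $P_t$ are simultaneously
diagonalizable, hence we can drop the parameter $t$ from
$\xi^i$.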
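The claim that the eigenvectors do not depend on the sampling probability can be checked with exact arithmetic. The sketch below builds each left eigenvector from the recursion $\xi^i_j = -\frac{m-j+1}{j-i}\,\xi^i_{j-1}$ for $j > i$ (with zeros before position $i$ and the free component normalised to 1, which is an arbitrary choice) and verifies $\xi^i P_t = \lambda_i(t)\,\xi^i$ for two different probabilities. The function names are hypothetical.

```python
from fractions import Fraction

def left_eigenvector(m, i):
    """xi^i: zero before position i, xi_i = 1, then the recursion for j > i."""
    xi = [Fraction(0)] * m
    xi[i - 1] = Fraction(1)                      # free normalisation
    for j in range(i + 1, m + 1):
        xi[j - 1] = -Fraction(m - j + 1, j - i) * xi[j - 2]
    return xi

def is_left_eigenvector(xi, m, i, p):
    """Check xi P = lambda_i xi for the bidiagonal P with probability p."""
    lam = 1 - (m - i) * p
    for j in range(1, m + 1):
        value = (1 - (m - j) * p) * xi[j - 1]     # diagonal entry of column j
        if j > 1:
            value += (m - j + 1) * p * xi[j - 2]  # entry above the diagonal
        if value != lam * xi[j - 1]:
            return False
    return True

m = 5
vectors = {i: left_eigenvector(m, i) for i in range(1, m + 1)}
same_for_all_p = all(is_left_eigenvector(vectors[i], m, i, p)
                     for i in vectors
                     for p in (Fraction(1, 10), Fraction(1, 7)))
print(same_for_all_p)                  # prints True
print([int(x) for x in vectors[2]])    # prints [0, 1, -3, 3, -1]
```

The same vectors satisfy the eigenvector equation for both probabilities, and every eigenvector except the last has components that sum to zero, which is the property used in Lemma 4.2.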

Let us now return to the initial probability distribution
$\pi(0)$, and let us express it in the eigenvector basis as

$$\pi(0) = \sum_{i=1}^{m} \alpha_i \xi^i \tag{6}$$

for some real numbers $\alpha_i$.

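Lemma 4.2: $\alpha_m \xi^m = e_m$, where $e_i$ is the Cartesian unit
vector $[0, \ldots, 0, 1, 0, \ldots, 0]$ with 1 in position $i$.

Proof: Let us consider $\xi^i \mathbf{1}$ for $i = 1, \ldots, m-1$, where
$\mathbf{1}$ is the column vector of ones. By equation (5),

$$\xi^i \mathbf{1} = \sum_{j=1}^{m} \xi^i_j = \sum_{j=i}^{m} \xi^i_j = \sum_{j=0}^{m-i} \xi^i_{i+j}.$$

We will show by induction that

$$\sum_{j=0}^{k} \xi^i_{i+j} = \frac{m-i-k}{m-i}\,\xi^i_{i+k}. \tag{7}$$

The case when $k = 0$ is clearly true, thus, assume (7) holds
for $k$ and consider $k + 1$,

$$\begin{aligned}
\sum_{j=0}^{k+1} \xi^i_{i+j} &= \sum_{j=0}^{k} \xi^i_{i+j} + \xi^i_{i+k+1}
= \frac{m-i-k}{m-i}\,\xi^i_{i+k} + \xi^i_{i+k+1} \\
&= \frac{m-i-k}{m-i} \cdot \frac{-(k+1)}{m-i-k}\,\xi^i_{i+k+1} + \xi^i_{i+k+1}
= \frac{m-i-(k+1)}{m-i}\,\xi^i_{i+k+1}.
\end{aligned}$$

Using (7) with $k = m - i$ implies that $\xi^i \mathbf{1} = 0$, $i = 1, \ldots, m-1$, and further,
$\pi(0)\mathbf{1} = \alpha_m \xi^m \mathbf{1}$. Since $\pi(0)$ is a probability distribution,
we know that $\pi(0)\mathbf{1} = 1$, but (5) tells us that only the last
component of $\xi^m$ is non-zero, hence the lemma follows. ■
We are now ready to prove the main theorem.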
We are now ready to prove the main theorem. 

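Proof (Theorem 4.1): The convergence condition is
equivalent to $\lim_{T\to\infty} \pi(T) = e_m$. Using (3) and (6) gives
us

$$\pi(T) = \pi(0)\prod_{t=0}^{T-1} P_t = \sum_{i=1}^{m} \alpha_i \xi^i \prod_{t=0}^{T-1} P_t
= \sum_{i=1}^{m} \alpha_i \xi^i \prod_{t=0}^{T-1} \lambda_i(t)
= \sum_{i=1}^{m-1} \alpha_i \xi^i \prod_{t=0}^{T-1} \lambda_i(t) + e_m, \tag{8}$$

where the last step uses $\lambda_m(t) = 1$ and Lemma 4.2.
Consider the limit $\lim_{T\to\infty} \pi(T)$,

$$\lim_{T\to\infty} |\pi(T) - e_m| = \lim_{T\to\infty} \left| \sum_{i=1}^{m-1} \alpha_i \xi^i \prod_{t=0}^{T-1} \lambda_i(t) \right|
\le \sum_{i=1}^{m-1} |\alpha_i \xi^i| \lim_{T\to\infty} \prod_{t=0}^{T-1} (1 - p_t),$$

since $0 \le \lambda_i(t) \le 1 - p_t$ for $i \le m - 1$.
Clearly, this converges to zero if $\lim_{T\to\infty} \prod_{t=0}^{T} (1 - p_t) = 0$.

Also, the set of initial probability distributions spans $\mathbb{R}^m$,
thus, there exists an initial probability distribution $\pi(0)$ such
that $\alpha_{m-1} \neq 0$. Assume $\lim_{T\to\infty} \prod_{t=0}^{T} (1 - p_t) = c > 0$
(the limit exists, since it is a monotone bounded sequence),
then

$$\lim_{T\to\infty} |\pi(T) - e_m| = \left| \sum_{i=1}^{m-2} \alpha_i \xi^i \left( \lim_{T\to\infty} \prod_{t=0}^{T-1} \lambda_i(t) \right) + \alpha_{m-1} \xi^{m-1} c \right|. \tag{9}$$

Since the eigenvectors are linearly independent, the RHS of
(9) is non-zero. Thus, we have proved the theorem. ■

Fig. 2. Convergence rate simulations. The neighbor set measurement $X_t$, for each node in the network, is shown as a function of the iteration number $t$. (a) Convergence in a network with 100 nodes, $m_u = 10$, and constant probability $Np_t = 1/2$. (b) Convergence is missing in a network with 100 nodes, $m_u = 10$, and decaying probability $Np_t = 1/(1+t/100)^2$. (c) Convergence in a network with 500 nodes, $m_u = 50$, and constant probability $Np_t = 1/2$.
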
Corollary 4.1: The graph generated by Algorithm 1 converges
to a gradient topology with probability 1 if and only
if

$$\lim_{T\to\infty} \sum_{t=0}^{T} p_t = \infty. \tag{10}$$

Proof: This follows from Theorem 4.1 and the relation

$$\lim_{T\to\infty} \prod_{t=0}^{T} (1 - p_t) = 0 \iff \lim_{T\to\infty} \sum_{t=0}^{T} p_t = \infty$$

for $0 \le p_t < 1$.
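The link between the product condition and the sum condition can be illustrated numerically. The sketch below compares a constant sampling probability with a decaying one of the form $p_t = 1/(N(1+t/100)^2)$; the network size, the constant value and the horizon are example assumptions, and the helper names are hypothetical.

```python
def survival_product(p_of_t, horizon):
    """prod_{t=0}^{horizon} (1 - p_t)."""
    product = 1.0
    for t in range(horizon + 1):
        product *= 1.0 - p_of_t(t)
    return product

def partial_sum(p_of_t, horizon):
    """sum_{t=0}^{horizon} p_t."""
    return sum(p_of_t(t) for t in range(horizon + 1))

N = 100  # example network size, used only to scale the probabilities

def constant(t):
    return 0.5 / N

def decaying(t):
    return 1.0 / (N * (1.0 + t / 100.0) ** 2)

horizon = 100_000
constant_product = survival_product(constant, horizon)
decaying_product = survival_product(decaying, horizon)
decaying_sum = partial_sum(decaying, horizon)
```

With the constant probability the partial sums grow without bound and the product collapses towards zero, while for the decaying sequence the partial sums stay close to 1 and the product remains bounded away from zero (roughly 0.37 with these example values).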



B. Convergence Rate Estimation 

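In this subsection, we investigate the convergence rate of
$X_t$, with a constant sampling probability $p_t = p$. Define

$$T_i = \inf\{t : X_t = 1 \mid X_0 = i\}$$

as the first time when $X_t$ reaches 1, when starting with
$X_0 = i$. Further, let $M_i = \mathbb{E}[T_i]$ denote the expected time
of convergence. Clearly $M_1 = 0$, and for $i = 2, \ldots, m$ we
have

$$\begin{aligned}
M_i &= 1 + \mathbb{P}\{X_{t+1} = i-1 \mid X_t = i\} \cdot M_{i-1} + \mathbb{P}\{X_{t+1} = i \mid X_t = i\} \cdot M_i \\
&= 1 + (i-1)p\,M_{i-1} + (1 - (i-1)p)\,M_i \\
\Rightarrow \quad M_i &= \frac{1 + (i-1)p\,M_{i-1}}{(i-1)p} = \frac{1}{(i-1)p} + M_{i-1}.
\end{aligned}$$

Continuing by induction yields

$$M_i = \frac{1}{p} \sum_{n=1}^{i-1} \frac{1}{n}.$$

The worst initial case is when $X_0 = m$, where the
expected convergence time is

$$M_m = \frac{1}{p} \sum_{n=1}^{m-1} \frac{1}{n}. \tag{11}$$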
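The recursion for the expected convergence time is easy to evaluate exactly. In the sketch below the function name is hypothetical, and the probabilities $p = 1/200$ and $p = 1/1000$ are assumptions, chosen because they reproduce the 566 and 4479 iterations quoted in Section V for $m = 10$ and $m = 50$.

```python
from fractions import Fraction

def expected_convergence_time(i, p):
    """M_i from the recursion M_1 = 0, M_i = M_{i-1} + 1 / ((i - 1) p)."""
    M = Fraction(0)
    for k in range(2, i + 1):
        M += 1 / ((k - 1) * Fraction(p))
    return M

print(round(expected_convergence_time(10, Fraction(1, 200))))   # prints 566
print(round(expected_convergence_time(50, Fraction(1, 1000))))  # prints 4479
```

Because the sum is a harmonic number, the expected time grows only logarithmically in $m$ for a fixed $p$, but scales as $1/p$.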



Remark 4.1: Notice that $M_m$ is the expected time for an
individual node to converge, and not the expected time for all 
the nodes in the network to converge to a gradient topology. 

V. Convergence simulation 

In this section, we examine the convergence of Algorithm 1
with numerical examples. In all examples, the utility
value set consists of ten distinct values, $\mathcal{A} = \{1, \ldots, 10\}$.
In the first two simulations (Figure 2(a) and Figure 2(b)), the
number of nodes of each utility value is $m_u = 10$, and for
the third simulation (Figure 2(c)), $m_u = 50$. Thus, the total
number of nodes in the network is $N = 100$ and $N = 500$,
respectively.

The similar view $N_i^s(0)$ is initialized with $m_u$ nodes
uniformly chosen among all nodes in the network. In the
first and third simulation the sampling probability $p_t$ is held
at a constant value with $Np_t = 1/2$. Hence, for each node, and at each
iteration of the algorithm, the random view is empty with
probability $1/2$. Theorem 4.1 guarantees the convergence of the
algorithm for these examples, which is also confirmed by the
simulations. These two simulations should also be compared
to the expected convergence time given by equation (11), 566
and 4479 iterations respectively.

In the second simulation (Figure 2(b)), we also analyze
a decaying probability $Np_t = \frac{1}{(1+t/100)^2}$. Notice that
$\sum_{t=0}^{\infty} p_t < \infty$, hence, by Corollary 4.1 there is a positive
probability that the algorithm does not converge to a gradient
topology. This is also confirmed by the simulation, in which
the gradient topology is missing.
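The single-node chain analysed in Section IV can also be simulated directly and compared with the expected convergence time. The sketch below is a Monte Carlo illustration only: it assumes $m = 10$ and a constant $p = 1/200$, uses a fixed seed for repeatability, and the function name `hitting_time` is hypothetical.

```python
import random

def hitting_time(m, p, rng):
    """Steps until X_t = 1 when X_0 = m and P{X -> X - 1} = (X - 1) p."""
    x, t = m, 0
    while x > 1:
        t += 1
        if rng.random() < (x - 1) * p:
            x -= 1
    return t

rng = random.Random(1)
samples = [hitting_time(10, 1 / 200, rng) for _ in range(2000)]
mean_time = sum(samples) / len(samples)
print(round(mean_time))
```

The sample mean should land close to the analytical value of about 566 iterations, although individual runs vary widely, which is consistent with the spread between nodes visible in the convergence plots.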



VI. Live-streaming using the Gradient - Experiments

Here, we evaluate the effect of sampling from the Gradient 
overlay compared to a random overlay when building a P2P 
live-streaming application called GLive. GLive is based on 
nodes cooperating to share a media stream supplied by a 
source node. GLive uses an approximate auction algorithm 
to match nodes that are willing and able to share the stream
with one another. GLive extends our previous work on tree-based
live-streaming, gradienTv [13] and Sepidar [9], to
mesh-based live-streaming.

Nodes want to establish connections to other nodes that are 
as close as possible to the source. They bid for connections 
to the best neighbours using the upload bandwidth they 
contribute as money. Nodes share their bounded number 
of connections with nodes who bid the highest (contribute 
the most upload bandwidth). Auctions are continuous and 
restarted on failures or free-riding. The desired effect of our
auction algorithm is that the source will upload to nodes 
who contribute the most upload bandwidth, who will, in turn, 
upload to nodes who contribute the next highest amount of 
bandwidth, and so on until the topology is fully constructed. 
More details on our approximate assignment algorithm can 
be found in [9]. 
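The bidding rule described above (a bounded number of connections, given to the highest bidders) can be illustrated with a toy slot allocator. This is a simplified sketch and not the approximate auction algorithm of [9]; the function `accept_bid`, its return convention and the example bids are hypothetical.

```python
def accept_bid(slots, capacity, bidder, bid):
    """Offer `bid` for one of `capacity` upload slots held in `slots`
    (a dict mapping bidder -> bid). Returns None if a free slot was used,
    the evicted bidder if a lower bid was displaced, or the bidder itself
    if the bid was rejected."""
    if len(slots) < capacity:
        slots[bidder] = bid
        return None
    lowest = min(slots, key=slots.get)
    if bid > slots[lowest]:
        del slots[lowest]
        slots[bidder] = bid
        return lowest      # the evicted node has to bid elsewhere
    return bidder          # bid too low: rejected

slots = {}
results = [
    accept_bid(slots, 2, "a", 4),
    accept_bid(slots, 2, "b", 2),
    accept_bid(slots, 2, "c", 6),   # displaces "b"
    accept_bid(slots, 2, "d", 1),   # rejected
]
print(results)  # prints [None, None, 'b', 'd']
```

The eviction of "b" is the kind of wasted connection that motivates sampling well-matched candidates instead of random ones: a node that bids where it can afford to stay is less likely to be displaced later.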

One of the main problems with the lack of global information
about nodes' upload bandwidths is that it affects
the rate of convergence of the auction algorithm. Nodes would
ideally like to bid for connections to other nodes who they 
can afford to connect to, rather than win a connection to 
a better node and be later removed because a better bid 
was received. The traditional way to discover nodes (to 
bid on) is using a uniform random peer-sampling service 
[5]. Instead, we use the Gradient overlay to sample nodes, 
where a node's utility value is the upload bandwidth it 
contributes to the system. As such, the Gradient should 
provide other nodes with references to nodes who have well- 
matched upload bandwidths. In [9], we showed that using 
the Gradient overlay reduced the rate of parent switching for 
tree-based live-streaming by 20% compared to random peer 
sampling. Here, we show for GLive the effect of sampling 
neighbours using random peer sampling (GLive/Random) 
versus sampling from the Gradient overlay (GLive/Gradient). 

We implemented GLive using Kompics' discrete event 
simulator that provides different bandwidth, latency and 
churn models. In our experimental setup, we set the stream- 
ing rate to 512Kbps, which is divided into blocks of 16Kb. 
Nodes start playing the media after buffering it for 5 seconds. 
The size of the similar view in GLive is 15 nodes. In the auction
algorithm, nodes have 8 download connections. To model
upload bandwidth, we assume that each upload connection
has available bandwidth of 64Kbps and that the number
of upload connections for nodes is set to $2i$, where $i$
is picked randomly from the range 1 to 10. This means
that nodes have upload bandwidth between 128Kbps and
1.25Mbps. As the average upload bandwidth of 704Kbps
is not much higher than the streaming rate of 512Kbps,
nodes have to find good matches as parents in order to achieve good
streaming performance. The media source is a single node 
with 40 upload connections, providing five times the upload 
bandwidth of the stream rate. We assume 11 utility levels, 
such that nodes contributing the same amount of upload 
bandwidth are located at the same utility level. Latencies 
between nodes are modeled using a latency map based on 
the King data-set [14]. We assume the size of sliding window 
for downloading is 32 blocks, such that the first 16 blocks 
are considered as the in-order set and the next 16 blocks are 
the blocks in the rare set. A block is chosen for download 
from the in-order set with 90% probability, and from the rare 
set with 10% probability. In the experiments, we measure the 
following metrics: 

1) Playback continuity: the percentage of blocks that a 
node received before their playback time. We consider 
the case where nodes have a playback continuity of 
greater than 99%; 

2) Playback latency: the difference in seconds between 
the playback point of a node and the playback point 
at the media source. 
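The two metrics above can be written down directly. The sketch below is one possible reading of the definitions, with hypothetical function names and example data; a block that never arrives is represented by `None`.

```python
def playback_continuity(arrival_times, playback_times):
    """Percentage of blocks that arrived before their playback time."""
    on_time = sum(1 for arrived, due in zip(arrival_times, playback_times)
                  if arrived is not None and arrived < due)
    return 100.0 * on_time / len(playback_times)

def playback_latency(node_playback_point, source_playback_point):
    """Seconds by which a node's playback point trails the source's."""
    return source_playback_point - node_playback_point

# Four blocks due at seconds 1..4; the third block never arrives.
continuity = playback_continuity([0.5, 1.2, None, 3.9], [1.0, 2.0, 3.0, 4.0])
latency = playback_latency(95.0, 101.5)
print(continuity, latency)  # prints 75.0 6.5
```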

We compare the playback continuity and playback latency 
of GLive/Gradient and GLive/Random in the following sce- 
narios: 

1) Churn: 500 nodes join the system following a Poisson 
distribution with an average inter-arrival time of 100 
milliseconds, and then till the end of the simulations 
nodes join and fail continuously following the same 
distribution with an average inter-arrival time of 1000 
milliseconds; 

2) Flash crowd: first, 100 nodes join the system following 
a Poisson distribution with an average inter-arrival time 
of 100 milliseconds. Then, 1000 nodes join following 
the same distribution with a shortened average inter- 
arrival time of 10 milliseconds; 

3) Catastrophic failure: 1000 nodes join the system fol- 
lowing a Poisson distribution with an average inter- 
arrival time of 100 milliseconds. Then, 500 existing 
nodes fail following a Poisson distribution with an 
average inter-arrival time of 10 milliseconds.
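Join and failure events with a given average inter-arrival time can be generated as a Poisson process, that is, with exponentially distributed gaps between events. The sketch below reproduces the event counts and means of the flash crowd scenario; the function name, the seed and the use of Python's `random` module are assumptions and are unrelated to the simulator used in the experiments.

```python
import random

def arrival_times(count, mean_interarrival_ms, rng, start_ms=0.0):
    """Event times of a Poisson process: exponential gaps with the given mean."""
    times, now = [], start_ms
    for _ in range(count):
        now += rng.expovariate(1.0 / mean_interarrival_ms)
        times.append(now)
    return times

rng = random.Random(42)
# Flash crowd: 100 joins at 100 ms mean spacing, then 1000 joins at 10 ms.
warmup = arrival_times(100, 100.0, rng)
crowd = arrival_times(1000, 10.0, rng, start_ms=warmup[-1])
mean_crowd_gap = (crowd[-1] - warmup[-1]) / len(crowd)
```

The other scenarios follow the same pattern with different counts and means, with failures drawn from a second call to the same generator.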

Figure 3 shows the percentage of the nodes that have
playback continuity of at least 99%. We see that all the nodes 
in GLive/Gradient receive at least 99% of all the blocks very 
quickly in all scenarios, while it takes slightly more time 
for GLive/Random. That is because nodes in GLive/Random 
randomly sample nodes to run the auction algorithm against, 
while GLive/Gradient runs the auction algorithm against 
nodes that contribute similar amounts of upload bandwidth. 
Random sampling takes longer time to find good matches 
for delivering the stream. One point to note is that the 5 
seconds of buffering cause the spike in playback continuity 
at the start, which then drops off as nodes are joining the 
system. To summarize, using the Gradient overlay instead
of random sampling produces better performance when the
system is undergoing large changes, such as large numbers
of nodes joining or failing over a short period of time. Figure 4
shows the playback latency of the systems in the different
scenarios. Although there is only a small
difference between the systems, GLive/Gradient
consistently maintains shorter playback latency
than GLive/Random in all experiments. The playback latency
includes both the 5 seconds of buffering time and the time
required to pull the blocks over the live-streaming overlay
constructed using the auction algorithm.

Fig. 3. Playback continuity of the systems in different scenarios.

Fig. 4. Playback latency of the Gradient versus Random sampling in different scenarios.

VII. Conclusions 
In this paper, we introduced the topology convergence 
problem for the gossip-generated Gradient overlay network. 
We showed the necessary and sufficient conditions for convergence
to a complete gradient structure. We characterized
the convergence time and provided bounds on the worst- 
case convergence time. Our experiments show the potential 
advantages of topologies built using preference functions. We 
showed how nodes can use implicit information captured 
in the Gradient topology to more efficiently find suitable 
neighbours compared to random sampling. As such, our work 
on proving convergence properties of the Gradient topol- 
ogy should have significance for other future information- 
carrying topologies. In future work, we will examine modi- 
fications to the topology construction algorithm that improve 
convergence time, as well as further applications of the 
topology in building P2P applications. 